On "deep" knowledge extraction from documents
Authors
Abstract
SYNDIKATE comprises a family of natural language understanding systems for automatically acquiring knowledge from real-world texts (e.g., information technology test reports, medical finding reports), and for transferring their content to formal representation structures which constitute a corresponding text knowledge base. We present a general system architecture which integrates requirements from the analysis of single sentences, as well as those of referentially linked sentences forming cohesive texts. Properly accounting for text cohesion phenomena is a prerequisite for the soundness and validity of the generated text representation structures. It is also crucial for any information system application making use of automatically generated text knowledge bases in a reliable way.
Similar resources
Extraction of Informative Expressions from Domain-specific Documents
What kinds of lexical resources are helpful for extracting useful information from domain-specific documents? Although domain-specific documents contain much useful knowledge, it is not obvious how to extract such knowledge efficiently from the documents. We need to develop techniques for extracting hidden information from such domain-specific documents. These techniques do not necessarily use ...
Feasibility Study for Procedural Knowledge Extraction in Biomedical Documents
We propose how to extract procedural knowledge rather than declarative knowledge utilizing machine learning method with deep language processing features in scientific documents, as well as how to model it. We show the representation of procedural knowledge in PubMed abstracts and provide experiments that are quite promising in that it shows 82%, 63%, 73%, and 70% performances of purpose/soluti...
Knowledge Extraction from Web Documents Using Self-Organizing Neural Networks
Knowledge discovery is defined as non-trivial extraction of implicit, previously unknown and potentially useful information from given data [1]. Knowledge extraction from web documents deals with unstructured, free-format documents whose number is enormous and rapidly growing.
A Framework for Extracting Biological Relations from Different Resources
The World Wide Web provides a vast source of information of almost all types. Biological data specifically have increased dramatically in the past years because of the exponential growth of knowledge in biological domain. It is very difficult to search for the required data in unstructured documents. Text documents often hide valuable structured data. This data can be exploited if available as ...
Sampling, information extraction and summarisation of Hidden Web databases
Hidden Web databases maintain a collection of specialised documents, which are dynamically generated in response to users’ queries. The majority of these documents are generated through Web page templates, which contain information that is often irrelevant to queries. In this paper, we present a system designed to detect and extract query-related information from documents sampled from database...
Automatic Extraction of Knowledge from Web Documents
A large amount of digital information available is written as text documents in the form of web pages, reports, papers, emails, etc. Extracting the knowledge of interest from such documents from multiple sources in a timely fashion is therefore crucial. This paper provides an update on the Artequakt system which uses natural language tools to automatically extract knowledge about artists from m...